Goto

Collaborating Authors

 computational method


TusoAI: Agentic Optimization for Scientific Methods

arXiv.org Artificial Intelligence

Scientific discovery is often slowed by the manual development of computational tools needed to analyze complex experimental data. Building such tools is costly and time-consuming because scientists must iteratively review literature, test modeling and scientific assumptions against empirical data, and implement these insights into efficient software. Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code, offering new opportunities to accelerate computational method development. Existing LLM-based systems either focus on performing scientific analyses using existing computational methods or on developing computational methods or models for general machine learning without effectively integrating the often unstructured knowledge specific to scientific domains. Here, we introduce TusoAI , an agentic AI system that takes a scientific task description with an evaluation function and autonomously develops and optimizes computational methods for the application. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis, improving performance over a pool of candidate solutions. We conducted comprehensive benchmark evaluations demonstrating that TusoAI outperforms state-of-the-art expert methods, MLE agents, and scientific AI agents across diverse tasks, such as single-cell RNA-seq data denoising and satellite-based earth monitoring. Applying TusoAI to two key open problems in genetics improved existing computational methods and uncovered novel biology, including 9 new associations between autoimmune diseases and T cell subtypes and 7 previously unreported links between disease variants linked to their target genes. Our code is publicly available at https://github.com/Alistair-Turcan/TusoAI.


Ethical Considerations of Large Language Models in Game Playing

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated tremendous potential in game playing, while little attention has been paid to their ethical implications in those contexts. This work investigates and analyses the ethical considerations of applying LLMs in game playing, using Werewolf, also known as Mafia, as a case study. Gender bias, which affects game fairness and player experience, has been observed from the behaviour of LLMs. Some roles, such as the Guard and Werewolf, are more sensitive than others to gender information, presented as a higher degree of behavioural change. We further examine scenarios in which gender information is implicitly conveyed through names, revealing that LLMs still exhibit discriminatory tendencies even in the absence of explicit gender labels. This research showcases the importance of developing fair and ethical LLMs. Beyond our research findings, we discuss the challenges and opportunities that lie ahead in this field, emphasising the need for diving deeper into the ethical implications of LLMs in gaming and other interactive domains.


Exploring the Technical Knowledge Interaction of Global Digital Humanities: Three-decade Evidence from Bibliometric-based perspectives

arXiv.org Artificial Intelligence

Digital Humanities (DH) is an interdisciplinary field that integrates computational methods with humanities scholarship to investigate innovative topics. Each academic discipline follows a unique developmental path shaped by the topics researchers investigate and the methods they employ. With the help of bibliometric analysis, most of previous studies have examined DH across multiple dimensions such as research hotspots, co-author networks, and institutional rankings. However, these studies have often been limited in their ability to provide deep insights into the current state of technological advancements and topic development in DH. As a result, their conclusions tend to remain superficial or lack interpretability in understanding how methods and topics interrelate in the field. To address this gap, this study introduced a new concept of Topic-Method Composition (TMC), which refers to a hybrid knowledge structure generated by the co-occurrence of specific research topics and the corresponding method. Especially by analyzing the interaction between TMCs, we can see more clearly the intersection and integration of digital technology and humanistic subjects in DH. Moreover, this study developed a TMC-based workflow combining bibliometric analysis, topic modeling, and network analysis to analyze the development characteristics and patterns of research disciplines. By applying this workflow to large-scale bibliometric data, it enables a detailed view of the knowledge structures, providing a tool adaptable to other fields.


Computationally Intensive Research: Advancing a Role for Secondary Analysis of Qualitative Data

arXiv.org Artificial Intelligence

This paper draws attention to the potential of computational methods in reworking data generated in past qualitative studies. While qualitative inquiries often produce rich data through rigorous and resource-intensive processes, much of this data usually remains unused. In this paper, we first make a general case for secondary analysis of qualitative data by discussing its benefits, distinctions, and epistemological aspects. We then argue for opportunities with computationally intensive secondary analysis, highlighting the possibility of drawing on data assemblages spanning multiple contexts and timeframes to address cross-contextual and longitudinal research phenomena and questions. We propose a scheme to perform computationally intensive secondary analysis and advance ideas on how this approach can help facilitate the development of innovative research designs. Finally, we enumerate some key challenges and ongoing concerns associated with qualitative data sharing and reuse.


Computational methods for Dynamic Answer Set Programming

arXiv.org Artificial Intelligence

In our daily lives and industrial settings, we often encounter dynamic problems that require reasoning over time and metric constraints. These include tasks such as scheduling, routing, and production sequencing. Dynamic logics have traditionally addressed these needs but often lack the flexibility and integration required for comprehensive problem modeling. This research aims to extend Answer Set Programming (ASP), a powerful declarative problem-solving approach, to handle dynamic domains effectively. By integrating concepts from dynamic, temporal, and metric logics into ASP, we seek to develop robust systems capable of modeling complex dynamic problems and performing efficient reasoning tasks, thereby enhancing ASPs applicability in industrial contexts.


Boundary-Decoder network for inverse prediction of capacitor electrostatic analysis

arXiv.org Artificial Intelligence

Traditional electrostatic simulation are meshed-based methods which convert partial differential equations into an algebraic system of equations and their solutions are approximated through numerical methods. These methods are time consuming and any changes in their initial or boundary conditions will require solving the numerical problem again. Newer computational methods such as the physics informed neural net (PINN) similarly require re-training when boundary conditions changes. In this work, we propose an end-to-end deep learning approach to model parameter changes to the boundary conditions. The proposed method is demonstrated on the test problem of a long air-filled capacitor structure. The proposed approach is compared to plain vanilla deep learning (NN) and PINN. It is shown that our method can significantly outperform both NN and PINN under dynamic boundary condition as well as retaining its full capability as a forward model.


A Computational Method for Measuring "Open Codes" in Qualitative Analysis

arXiv.org Artificial Intelligence

Qualitative analysis is critical to understanding human datasets in many social science disciplines. Open coding is an inductive qualitative process that identifies and interprets "open codes" from datasets. Yet, meeting methodological expectations (such as "as exhaustive as possible") can be challenging. While many machine learning (ML)/generative AI (GAI) studies have attempted to support open coding, few have systematically measured or evaluated GAI outcomes, increasing potential bias risks. Building on Grounded Theory and Thematic Analysis theories, we present a computational method to measure and identify potential biases from "open codes" systematically. Instead of operationalizing human expert results as the "ground truth," our method is built upon a team-based approach between human and machine coders. We experiment with two HCI datasets to establish this method's reliability by 1) comparing it with human analysis, and 2) analyzing its output stability. We present evidence-based suggestions and example workflows for ML/GAI to support open coding.


Artificial Intelligence for Microbiology and Microbiome Research

arXiv.org Machine Learning

Advancements in artificial intelligence (AI) have transformed many scientific fields, with microbiology and microbiome research now experiencing significant breakthroughs through machine learning and deep learning applications. This review provides a comprehensive overview of AI-driven approaches tailored for microbiology and microbiome studies, emphasizing both technical advancements and biological insights. We begin with an introduction to foundational AI techniques, including primary machine learning paradigms and various deep learning architectures, and offer guidance on choosing between machine learning and deep learning methods based on specific research goals. The primary section on application scenarios spans diverse research areas, from taxonomic profiling, functional annotation & prediction, microbe-X interactions, microbial ecology, metabolic modeling, precision nutrition, clinical microbiology, to prevention & therapeutics. Finally, we discuss challenges unique to this field, including the balance between interpretability and complexity, the "small n, large p" problem, and the critical need for standardized benchmarking datasets to validate and compare models. Together, this review underscores AI's transformative role in microbiology and microbiome research, paving the way for innovative methodologies and applications that enhance our understanding of microbial life and its impact on our planet and our health.


Building a Language-Learning Game for Brazilian Indigenous Languages: A Case of Study

arXiv.org Artificial Intelligence

We discuss in detail the challenges of building a Language learning games are key tools to vitalize language learning tool for BIL, such as the lack of endangered languages (Thomason, 2015; Xu et al., written and phonetical resources, ethical concerns 2022; Neubig et al., 2020). LARA (Akhlaghi et al., on available treebanks and databases used for exercise 2019), a multi language learning assistant, is an generation, and provide some suggestions example that has been key to support actions related on steps forward. We managed to build a minimal to endangered languages protection (Rayner proof of concept course for Guajajara language divided and Wilmoth, 2023; Bédi et al., 2022; Zuckermann in two sections. We employed dependency et al., 2021). Despite the necessity of language treebanks and a lexical database on BIL as source learning tools to vitalize endangered languages, for exercise generation. The main contribution of they are typically restricted to high-resource languages, this work is to present a case of study on building a such as english, and require significant language learning tool for BIL and, we hope, it will effort to be extended to languages with few spoken serve as an starting point for the development of and written resources.


Epigenomics Now

Communications of the ACM

For nearly a quarter-century, we have had a (mostly) complete listing of the human genome, the three-billion "letter" sequence of DNA, most of which is the same for all of us. This reference copy makes it much easier for scientists to understand biological processes and to identify the individual variations, such as mutations, that contribute to disease. Despite its central role and its extreme usefulness, however, the genome's impact on healthcare has been smaller than many proponents had hoped. Part of the reason is that while most of the cells in the human body carry identical DNA, the biological activity of different regions varies widely over time and between different tissues. It is these differences in gene expression that orchestrate the intricate development of tissues and the unique features of various cell types, as well as much of the misbehavior of cells in disease.